


yuancu commented Oct 16, 2025

Description

The chart command returns an aggregation result as a two-dimensional table.

Work items:

  • support span
  • support limit, limit=top x, limit=bottom x
  • support useother, otherstr
  • correct limit behavior with non-cumulative aggregation functions (min, max, avg, etc.) (fixed in #4594, Fix timechart OTHER category aggregation for non-cumulative functions)
  • support usenull, nullstr
  • support non-string fields as column split
  • add integration tests
  • add explain tests
  • add a doc
  • add a brief walk-through of the implementation
  • add anonymizer support and tests

Related Issues

Resolves #399

Implementation Walk-through

Ideally, chart should pivot the result into a two-dimensional table. For example, given the following table:

a b val
m x 3
m y 4

Running | chart avg(val) by a, b on it should produce a table like this:

a x y
m 3 4

However, dynamic pivoting does not appear to be supported in SQL/Calcite, since the output columns would depend on the data values (see the original discussion in #3965 (comment)). Therefore, the result table of the implemented chart command looks like this:

a b avg(val)
m x 3
m y 4

The pivoting can instead be performed in the front end.
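As an illustration of what front-end pivoting could look like, here is a minimal Python sketch. It assumes the chart result arrives as a list of row dicts keyed by column name; the function name pivot and that row format are hypothetical and not part of this PR.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Turn long-format rows (as returned by chart) into a wide table.

    rows: list of dicts, e.g. {"a": "m", "b": "x", "avg(val)": 3}
    Returns (columns, table), where table maps each row_key value to
    {col_key value: aggregated value}.
    """
    columns = []  # distinct col_key values, in first-seen order
    table = defaultdict(dict)
    for r in rows:
        col = r[col_key]
        if col not in columns:
            columns.append(col)
        table[r[row_key]][col] = r[value_key]
    return columns, dict(table)


rows = [
    {"a": "m", "b": "x", "avg(val)": 3},
    {"a": "m", "b": "y", "avg(val)": 4},
]
columns, table = pivot(rows, "a", "b", "avg(val)")
print(["a"] + columns)
for key, cells in table.items():
    # (row, column) combinations absent from the result are shown as None
    print([key] + [cells.get(c) for c in columns])
```

On the example above this prints the header a, x, y followed by the row m, 3, 4, i.e. the ideal pivoted table.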

The above operation is equivalent to stats avg(val) by a, b when parameters such as usenull, useother, and limit do not affect the result.

When these parameters are involved, the chart command finds the top-N categories of b, merges the remaining categories into an OTHER category, and merges rows whose b is null into a NULL category. This leads to the following implementation:

  1. aggregate normally by a, b (equivalent to stats agg_func by a, b)
  2. find the top-N categories (unique values of column b) by aggregating over the results of step 1:
    1. aggregate on b
    2. sort by the aggregated values
    3. number the rows
  3. left join the ranked results with the original aggregation
  4. keep b as-is for rows whose row number is no greater than the limit, and relabel the remaining categories as OTHER (or NULL when b is null)
  5. aggregate again, because rows relabeled as OTHER or NULL need to be merged
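The five steps above can be emulated in memory with the following Python sketch. It is not the PR's Calcite plan. It assumes sum as the aggregation function, so that merging partial results in step 5 is exact (non-cumulative functions such as avg cannot simply be re-aggregated this way), and it assumes categories are ranked by their total over all values of a. The function chart_sum is hypothetical; its top, useother, usenull, otherstr, and nullstr parameters only mirror the PPL options named above.

```python
from collections import defaultdict


def chart_sum(rows, limit, top=True, useother=True, usenull=True,
              otherstr="OTHER", nullstr="NULL"):
    """In-memory emulation of `chart sum(val) by a, b` with a limit.

    rows: list of (a, b, val) tuples; b may be None.
    top=True ranks like limit=top N, top=False like limit=bottom N.
    """
    # Step 1: normal aggregation by (a, b), like `stats sum(val) by a, b`
    agg = defaultdict(int)
    for a, b, val in rows:
        agg[(a, b)] += val

    # Step 2: aggregate the step-1 results on b, sort, and number the rows
    # (assumption: each category is ranked by its total over all a)
    totals = defaultdict(int)
    for (_, b), v in agg.items():
        if b is not None:
            totals[b] += v
    ranked = sorted(totals, key=totals.get, reverse=top)
    row_number = {b: i + 1 for i, b in enumerate(ranked)}

    # Steps 3-4: attach each aggregated row's rank (null b has no rank, as
    # in a left join) and relabel categories outside the limit
    relabeled = []
    for (a, b), v in agg.items():
        if b is None:
            if usenull:
                relabeled.append((a, nullstr, v))
        elif row_number[b] <= limit:
            relabeled.append((a, b, v))
        elif useother:
            relabeled.append((a, otherstr, v))

    # Step 5: aggregate again so rows relabeled as OTHER/NULL are merged
    result = defaultdict(int)
    for a, b, v in relabeled:
        result[(a, b)] += v
    return dict(result)


rows = [("m", "x", 5), ("m", "y", 3), ("m", "z", 1),
        ("n", "x", 2), ("n", None, 4)]
print(chart_sum(rows, limit=1))             # behaves like limit=top 1
print(chart_sum(rows, limit=1, top=False))  # behaves like limit=bottom 1
```

With limit=1 and top ranking, x (total 7) is kept, y and z are merged into OTHER for m, and the null row of n becomes NULL. Disabling useother or usenull drops those rows instead of relabeling them.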

Note:

This implementation does not reuse the timechart implementation, in order to avoid some existing bugs in it. A follow-up PR will unify the two, since chart is essentially a functional superset of timechart.

Future work items

  • support multiple aggregation functions (left as a TODO: because the results are not pivoted, the output becomes messy when multiple aggregations are involved)
  • unify implementation of timechart and chart
  • support more bin options such as bins (after #4612, Fix bins on time-related fields)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

yuancu added the feature label Oct 16, 2025
yuancu force-pushed the issues/399 branch 2 times, most recently from 8297023 to 6b8934e October 24, 2025 06:12
yuancu marked this pull request as ready for review October 24, 2025 08:56
yuancu marked this pull request as draft October 28, 2025 14:38
yuancu marked this pull request as ready for review October 29, 2025 01:58
yuancu changed the title from "WIP: Support chart command in PPL" to "Support chart command in PPL" Oct 29, 2025
yuancu force-pushed the issues/399 branch 2 times, most recently from db25ef7 to 86b4cb3 October 29, 2025 12:04
yuancu added 23 commits October 30, 2025 10:26
Signed-off-by: Yuanchun Shen <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java


Development

Successfully merging this pull request may close these issues.

[RFC] PPL Chart Command
